A Metadata Focused Crawler for Linked Data

نویسندگان

  • Raphael do Vale Amaral Gomes
  • Marco A. Casanova
  • Giseli Rabello Lopes
  • Luiz André P. Paes Leme
چکیده

The Linked Data best practices recommend publishers of triplesets to use well-known ontologies in the triplication process and to link their triplesets with other triplesets. However, despite the fact that extensive lists of open ontologies and triplesets are available, most publishers typically do not adopt those ontologies and link their triplesets only with popular ones, such as DBpedia and Geonames. This paper presents a metadata crawler for Linked Data to assist publishers in the triplification and the linkage processes. The crawler provides publishers with a list of the most suitable ontologies and vocabulary terms for triplification, as well as a list of triplesets that the new tripleset can be most likely linked with. The crawler focuses on specific metadata properties, including subclass of, and returns only metadata, hence the classification “metadata focused crawler”.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Approach Towards Vertical Search Engines - Intelligent Focused Crawling and Multilingual Semantic Techniques

Search engines typically consist of a crawler which traverses the web retrieving documents and a search frontend which provides the user interface to the acquired information. Focused crawlers refine the crawler by intelligently directing it to predefined topic areas. The evolution of search engines today is expedited by supplying more search capabilities such as a search for metadata as well a...

متن کامل

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...

متن کامل

Focused Crawling: A New Approach to Topic-Specific Resource Discovery∗

The rapid growth of the world-wide web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext information management system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exem...

متن کامل

Effective Focused Crawling Based on Content and Link Structure Analysis

A focused crawler traverses the web selecting out relevant pages to a predefined topic and neglecting those out of concern. While surfing the internet it is difficult to deal with irrelevant pages and to predict which links lead to quality pages. In this paper, a technique of effective focused crawling is implemented to improve the quality of web navigation. To check the similarity of web pages...

متن کامل

Kairos: Proactive Harvesting of Research Paper Metadata from Scientific Conference Web Sites

We investigate the automatic harvesting of research paper metadata from recent scholarly events. Our system, Kairos, combines a focused crawler and an information extraction engine, to convert a list of conference websites into a index filled with fields of metadata that correspond to individual papers. Using event date metadata extracted from the conference website, Kairos proactively harvests...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014